Fossil Fuels & Economic Development

Rationale of our two questions

How do fossil fuels relate to social and economic development in different countries? * Can we identify groups of countries with similar oil production and GDP and if/how oil production impacts a country’s GDP? * Can we identify groups of countries with similar population and fossil fuel usage and if/how population size affects fossil fuel usage?

Data: * Population * GDP per Capita * Oil Production (92 countries) * Fossil fuel use (as % of total electricity generating capacity)

QUESTION 1 CLUSTERING CODE START

RESEARCH QUESTION: ‘Can we identify groups of countries with similar oil production and GDP and if/how oil production impacts a country’s GDP?’

Data used: * Oil Production: [from U.S. Energy Information Administration] For calendar year 2019, on a comparable best-estimate basis * GDP per Capita: [from Wikipedia] Converted at market exchange rates to current U.S. dollars, divided by the population for the same year

Prep to cluster OilProduction and GDP_pc Getting data from github and initializing:

## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

Removing rows where OilProduction == 0:

Clustering Part

Preparing to cluster oil production & GDP:

##           OilProduction GDP_pc
## Albania           22915   5372
## Algeria         1348361   3980
## Angola          1769615   3037
## Argentina        510560   9887
## Australia        289749  53825
## Austria           15161  50022

This is for replicability of results.

Partitioning Technique: PAM

  1. Apply function and indicate the amount of clusters required

  2. Clustering results

TABLE OF CLUSTERS:

## 
##  1  2  3  4 
## 58 14 17  3
  1. Evaluate Results

AVG SILHOUETTES:

## Loading required package: ggplot2
## Welcome! Want to learn more? See two factoextra-related books at https://goo.gl/ve3WBa
##   cluster size ave.sil.width
## 1       1   58          0.58
## 2       2   14          0.41
## 3       3   17          0.52
## 4       4    3          0.30

DECTECTING ANOMALIES:

##                        cluster neighbor sil_width
## Vietnam                      1        3 0.7014283
## Congo, Republic of the       1        3 0.7006665
## Papua New Guinea             1        3 0.6977134
## Ghana                        1        3 0.6972409
## Timor-Leste                  1        3 0.6957549
## Tunisia                      1        3 0.6955229

Requesting negative silhouettes:

##         cluster neighbor    sil_width
## Italy         2        3 -0.277837863
## Romania       3        1 -0.008107932

Hierarchizing/Agglomerative Technique: AGNES

  1. Apply function and indicate the amount of clusters required

  2. Clustering results

TABLE OF CLUSTERS:

## 
##  1  2  3  4 
## 59 14 16  3
  1. Evaluate results

DENDOGRAM:

AVG SILHOUETTES:

##   cluster size ave.sil.width
## 1       1   59          0.57
## 2       2   14          0.35
## 3       3   16          0.60
## 4       4    3          0.30

DECTECTING ANOMALIES:

##                        cluster neighbor sil_width
## Vietnam                      1        3 0.6977023
## Congo, Republic of the       1        3 0.6970225
## Egypt                        1        3 0.6946234
## Ghana                        1        3 0.6884544
## Papua New Guinea             1        3 0.6880476
## Timor-Leste                  1        3 0.6860947

Requesting negative silhouettes:

##         cluster neighbor   sil_width
## Romania       1        3 -0.02769694
## Kuwait        2        3 -0.25164833

Hierarchizing/Divisive Technique: DIANA

  1. Apply function and indicate the amount of clusters required

  2. Clustering results

TABLE OF CLUSTERS:

## 
##  1  2  3  4 
## 72 17  2  1
  1. Evaluate results

DENDOGRAM:

AVG SILHOUETTES:

##   cluster size ave.sil.width
## 1       1   72          0.72
## 2       2   17          0.50
## 3       3    2          0.70
## 4       4    1          0.00

DECTECTING ANOMALIES:

##           cluster neighbor sil_width
## Mongolia        1        2 0.8295996
## Bolivia         1        2 0.8294036
## Ukraine         1        2 0.8291026
## Guatemala       1        2 0.8285038
## Tunisia         1        2 0.8284611
## Georgia         1        2 0.8284214

Requesting negative silhouettes:

## [1] cluster   neighbor  sil_width
## <0 rows> (or 0-length row.names)

Density Based Clustering: DBSCAN

HOW MANY OUTLIERS? (0 identified outliers)

## DBSCAN clustering for 92 objects.
## Parameters: eps = 0.03, minPts = 4
## The clustering contains 3 cluster(s) and 20 noise points.
## 
##  0  1  2  3 
## 20 53 14  5 
## 
## Available fields: cluster, eps, minPts

Save coordinates to original data frame:

  • Plot PAM:

  • Plot AGNES:

  • Plot DIANA:

Compare results visually:

## Loading required package: magrittr

  • Plot DBSCAN:

Annotating:

Annotating Outliers:

BASED ON CLUSTERING, WE WILL USE DBSCAN. This cluster had high production &/OR high GDP (outliers).

##  [1] "Australia"            "Austria"              "Brazil"              
##  [4] "Canada"               "China"                "Denmark"             
##  [7] "Iran"                 "Iraq"                 "Italy"               
## [10] "Kazakhstan"           "Kuwait"               "Mexico"              
## [13] "Netherlands"          "Norway"               "Oman"                
## [16] "Russia"               "Saudi Arabia"         "United Arab Emirates"
## [19] "United Kingdom"       "United States"

This cluster had higher production & lower GDP.

##  [1] "Albania"                          
##  [2] "Algeria"                          
##  [3] "Angola"                           
##  [4] "Argentina"                        
##  [5] "Azerbaijan"                       
##  [6] "Bangladesh"                       
##  [7] "Belarus"                          
##  [8] "Belize"                           
##  [9] "Bolivia"                          
## [10] "Bulgaria"                         
## [11] "Burma"                            
## [12] "Cameroon"                         
## [13] "Chad"                             
## [14] "Colombia"                         
## [15] "Congo, Democratic Republic of the"
## [16] "Congo, Republic of the"           
## [17] "Ecuador"                          
## [18] "Egypt"                            
## [19] "Equatorial Guinea"                
## [20] "Gabon"                            
## [21] "Georgia"                          
## [22] "Ghana"                            
## [23] "Guatemala"                        
## [24] "India"                            
## [25] "Indonesia"                        
## [26] "Kyrgyzstan"                       
## [27] "Libya"                            
## [28] "Malaysia"                         
## [29] "Mauritania"                       
## [30] "Mongolia"                         
## [31] "Morocco"                          
## [32] "Niger"                            
## [33] "Nigeria"                          
## [34] "Pakistan"                         
## [35] "Papua New Guinea"                 
## [36] "Peru"                             
## [37] "Philippines"                      
## [38] "Romania"                          
## [39] "Serbia"                           
## [40] "South Africa"                     
## [41] "Sudan"                            
## [42] "Suriname"                         
## [43] "Tajikistan"                       
## [44] "Thailand"                         
## [45] "Timor-Leste"                      
## [46] "Tunisia"                          
## [47] "Turkey"                           
## [48] "Turkmenistan"                     
## [49] "Ukraine"                          
## [50] "Uzbekistan"                       
## [51] "Venezuela"                        
## [52] "Vietnam"                          
## [53] "Yemen"

This cluster had lower production & lower GDP.

##  [1] "Bahrain"             "Barbados"            "Brunei"             
##  [4] "Chile"               "Croatia"             "Czechia"            
##  [7] "Greece"              "Hungary"             "Lithuania"          
## [10] "Poland"              "Slovakia"            "Spain"              
## [13] "Taiwan"              "Trinidad and Tobago"

This cluster had lower production & higher GDP.

## [1] "France"      "Germany"     "Israel"      "Japan"       "New Zealand"

QUESTION 1 REGRESSION START

  • Hypothesis:

    • Model 1: GDP Per Capita ~ Oil Production

    • Model 2: GDP Per Capita ~ Oil Production + Continent

  • Continuous Outcome – – GDP Per Capita
  • Independent variable – – Oil Production
  • Control variable – – Continent
  • Rationale for hypothesis
    • Oil infrastructure supports GDP
    • OPEC // many economies heavily rely on oil income
    • Oil price wars (like now with Saudi Arabia and Russia) impact oil prices and thus GDP

Preparing to regress Oil Production & GDP

## 'data.frame':    92 obs. of  12 variables:
##  $ Country                : chr  "Albania" "Algeria" "Angola" "Arge"..
##  $ fossilFuel_PctTotalElec: num  0.05 0.96 0.34 0.69 0.72 0.25 0.84 ..
##  $ OilProduction          : num  22915 1348361 1769615 510560 289749..
##  $ Population             : int  2880917 43053054 31825295 44780677 ..
##  $ GDP_pc                 : int  5372 3980 3037 9887 53825 50022 468..
##  $ Continent              : Factor w/ 7 levels "Africa","Asia",..: 3..
##  $ pam                    : Factor w/ 4 levels "1","2","3","4": 1 1 ..
##  $ agn                    : Factor w/ 4 levels "1","2","3","4": 1 1 ..
##  $ dia                    : Factor w/ 4 levels "1","2","3","4": 1 1 ..
##  $ db                     : Factor w/ 4 levels "0","1","2","3": 2 2 ..
##  $ dim1                   : num  -0.0795 -0.0701 -0.0722 -0.0399 0.2..
##  $ dim2                   : num  0.0111 -0.0435 -0.063 -0.0116 0.159..

EXPLANATORY APPROACH

1.State the hypotheses

hypo1=formula(GDP_pc ~ OilProduction)
hypo2=formula(GDP_pc ~ OilProduction + Continent)

2.Save colums needed and varify data types

## 'data.frame':    92 obs. of  3 variables:
##  $ OilProduction: num  22915 1348361 1769615 510560 289749 ...
##  $ GDP_pc       : int  5372 3980 3037 9887 53825 50022 4689 25273 1905 18069 ...
##  $ Continent    : Factor w/ 7 levels "Africa","Asia",..: 3 1 1 7 6 3 4 2 2 5 ...

3.Compute regression models

4.Hypothesis results

  • First Hypothesis:
## 
## Call:
## glm(formula = hypo1, family = "gaussian", data = DataRegGauss)
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -20272  -11232   -7177    5968   61852  
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   1.337e+04  1.882e+03   7.103 2.76e-10 ***
## OilProduction 1.673e-03  7.318e-04   2.286   0.0246 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 280755365)
## 
##     Null deviance: 2.6736e+10  on 91  degrees of freedom
## Residual deviance: 2.5268e+10  on 90  degrees of freedom
## AIC: 2054.7
## 
## Number of Fisher Scoring iterations: 2
  • Second Hypothesis:
summary(gauss2)
## 
## Call:
## glm(formula = hypo2, family = "gaussian", data = DataRegGauss)
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -29507   -8171   -1380    5466   47217  
## 
## Coefficients:
##                          Estimate Std. Error t value Pr(>|t|)    
## (Intercept)             2.227e+03  3.288e+03   0.677 0.500143    
## OilProduction           2.140e-03  6.498e-04   3.294 0.001448 ** 
## ContinentAsia           7.450e+03  4.231e+03   1.761 0.081897 .  
## ContinentEurope         2.500e+04  4.421e+03   5.656 2.09e-07 ***
## ContinentEurope/Asia   -2.640e+02  7.171e+03  -0.037 0.970725    
## ContinentNorth America  1.499e+04  6.409e+03   2.340 0.021675 *  
## ContinentOceania        2.990e+04  8.669e+03   3.449 0.000882 ***
## ContinentSouth America  3.519e+03  5.678e+03   0.620 0.537102    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 193147596)
## 
##     Null deviance: 2.6736e+10  on 91  degrees of freedom
## Residual deviance: 1.6224e+10  on 84  degrees of freedom
## AIC: 2026
## 
## Number of Fisher Scoring iterations: 2

5.Searching for a better model

## Analysis of Deviance Table
## 
## Model 1: GDP_pc ~ OilProduction
## Model 2: GDP_pc ~ OilProduction + Continent
##   Resid. Df Resid. Dev Df   Deviance Pr(>Chi)    
## 1        90 2.5268e+10                           
## 2        84 1.6224e+10  6 9043584826 2.03e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Model for the Second hypothesis is chosen. This is the RSquared:

## [1] 0.3425815

6.Verify the situation of chosen model:

6.1. Linearity between dependent variable and predictors is assumed, then these dots should follow a linear and horizontal trend:

The linear trend is not obvious, and the distribution range goes wider when the predicted values increase. I’d like to say it represents the linearity between our variables in a certain level. Further research upon outliers are necessary.

6.2. Normality of residuals is assumed:

Visual exploration

Mathematical exploration:

## 
##  Shapiro-Wilk normality test
## 
## data:  gauss2$residuals
## W = 0.94464, p-value = 0.000681

6.3. Homoscedasticity is assumed, so check if residuals are spread equally along the ranges of predictors

Visual exploration:

Mathematical exploration:

## Loading required package: zoo
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
## 
##  studentized Breusch-Pagan test
## 
## data:  gauss2
## BP = 19.735, df = 7, p-value = 0.006171

6.4. We assume that there is no colinearity, that is, that the predictors are not correlated.

## Loading required package: carData
## 
## Attaching package: 'car'
## The following object is masked from 'package:dplyr':
## 
##     recode
##                   GVIF Df GVIF^(1/(2*Df))
## OilProduction 1.146212  1        1.070613
## Continent     1.146212  6        1.011437

6.5. Analize the effect of atypical values. Determine if outliers (points that are far from the rest, but still in the trend) or high-leverage points (far from the trend but close to the rest) are influential

Visual exploration:

Querying:

gaussInf=as.data.frame(influence.measures(gauss2)$is.inf)
gaussInf[gaussInf$cook.d,]
##  [1] dfb.1_   dfb.OlPr dfb.CntA dfb.CntE dfb.CE/A dfb.CnNA dfb.CntO
##  [8] dfb.CnSA dffit    cov.r    cook.d   hat     
## <0 rows> (or 0-length row.names)
  1. Finally, a nice summary plot of our work
## Registered S3 methods overwritten by 'lme4':
##   method                          from
##   cooks.distance.influence.merMod car 
##   influence.merMod                car 
##   dfbeta.influence.merMod         car 
##   dfbetas.influence.merMod        car
## Learn more about sjPlot with 'browseVignettes("sjPlot")'.

PREDICTIVE APPROACH

  1. Splitting the data set
## Loading required package: lattice
  1. Regress with train data
## 
## Call:
## NULL
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -29470   -9388   -1480    7635   44909  
## 
## Coefficients:
##                            Estimate Std. Error t value Pr(>|t|)    
## (Intercept)               2.204e+03  4.180e+03   0.527  0.59982    
## OilProduction             2.670e-03  8.371e-04   3.189  0.00221 ** 
## ContinentAsia             8.393e+03  5.127e+03   1.637  0.10657    
## ContinentEurope           2.646e+04  5.685e+03   4.655 1.68e-05 ***
## `ContinentEurope/Asia`   -1.646e+03  8.083e+03  -0.204  0.83925    
## `ContinentNorth America`  1.532e+04  7.708e+03   1.988  0.05109 .  
## ContinentOceania          2.986e+04  9.604e+03   3.109  0.00280 ** 
## `ContinentSouth America`  3.651e+03  7.890e+03   0.463  0.64509    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 224537488)
## 
##     Null deviance: 2.3585e+10  on 71  degrees of freedom
## Residual deviance: 1.4370e+10  on 64  degrees of freedom
## AIC: 1598.4
## 
## Number of Fisher Scoring iterations: 2
  1. Evaluate performance
##         RMSE     Rsquared          MAE 
## 10322.575720     0.536489  7547.290591

QUESTION 2 CLUSTERING START

RESEARCH QUESTION: ‘Can we identify groups of countries with similar population and fossil fuel usage and if/how population size affects fossil fuel usage?’

Data used: * fossilFuel_PctTotalElec: [from CIA World Factbook] percentage of total electricity generating capacity that comes from fossil fuels * Population: [UN Dept of Economic and Social Affairs] World population estimates

Prep to cluster fossilFuel_PctTotalElec and Population

##           fossilFuel_PctTotalElec Population
## Albania                      0.05    2880917
## Algeria                      0.96   43053054
## Angola                       0.34   31825295
## Argentina                    0.69   44780677
## Australia                    0.72   25203198
## Austria                      0.25    8955102

Set random seed for replicability of results:

Setting distance matrix:

Defining number of clusters for each method (NumCluster = 5) Clustering via pam method:

Adding pam results to original DF (DFnew1)

REPORT: Table of Cluster:

## 
##  1  2  3  4  5 
## 14 26 30 20  2

REPORT: Evaluate Results:

##   cluster size ave.sil.width
## 1       1   14          0.38
## 2       2   26          0.63
## 3       3   30          0.36
## 4       4   20          0.57
## 5       5    2          0.86

REPORT: Detecting Anomalies

Saving individual silhouettes

##                                   cluster neighbor sil_width
## Tajikistan                              1        4 0.6090716
## Albania                                 1        4 0.6060691
## Norway                                  1        4 0.5978116
## Timor-Leste                             1        4 0.5667970
## Congo, Democratic Republic of the       1        4 0.5525901
## France                                  1        4 0.5468275

Requesting negative silhouettes:

##         cluster neighbor  sil_width
## Angola        1        4 -0.2800818
## Georgia       1        4 -0.3377824
## Ghana         3        4 -0.1223473

Cluster via agnes method; indicate number of clusters (NumCluster):

Adding agn results to original DF (DFnew1)

REPORT: Table of clusters:

## 
##  1  2  3  4  5 
## 12 19 21 38  2

Evaluating results:

REPORT: Average silhouettes

##   cluster size ave.sil.width
## 1       1   12          0.47
## 2       2   19          0.79
## 3       3   21          0.60
## 4       4   38          0.23
## 5       5    2          0.87

REPORT: Detecting anomalies

##                                   cluster neighbor sil_width
## Tajikistan                              1        3 0.6486823
## Albania                                 1        3 0.6461141
## Norway                                  1        3 0.6392016
## Timor-Leste                             1        3 0.6056001
## Congo, Democratic Republic of the       1        3 0.5963339
## France                                  1        3 0.5571241

Requesting negative silhouettes:

##              cluster neighbor   sil_width
## Colombia           1        3 -0.03279474
## Indonesia          4        2 -0.01212078
## Iran               4        2 -0.03057378
## Chile              4        3 -0.06866262
## Azerbaijan         4        2 -0.14697731
## South Africa       4        2 -0.15226216
## Ghana              4        3 -0.17299110
## Greece             4        3 -0.27237486
## Uzbekistan         4        2 -0.28525827
## Kazakhstan         4        2 -0.31159271
## Mongolia           4        2 -0.41514761

Cluster via diana method; indicate number of clusters (NumCluster):

Adding diana results to original DF (DFnew1):

REPORT: Table of clusters

## 
##  1  2  3  4  5 
## 12 26 23 29  2

REPORT: Dendrogram

REPORT: Average silhouettes

##   cluster size ave.sil.width
## 1       1   12          0.49
## 2       2   26          0.62
## 3       3   23          0.52
## 4       4   29          0.39
## 5       5    2          0.86

REPORT: Detecting anomalies:

##                                   cluster neighbor sil_width
## Tajikistan                              1        3 0.6564827
## Albania                                 1        3 0.6539019
## Norway                                  1        3 0.6467797
## Timor-Leste                             1        3 0.6133818
## Congo, Democratic Republic of the       1        3 0.6057145
## France                                  1        3 0.5720875

Requesting negative silhouettes:

## [1] cluster   neighbor  sil_width
## <0 rows> (or 0-length row.names)

Cluster via DBSCAN method; indicate minimum neighbors (4):

Setting distance (epsilon):

REPORT: Number of clusters and outliers produced

## DBSCAN clustering for 92 objects.
## Parameters: eps = 0.03, minPts = 4
## The clustering contains 4 cluster(s) and 11 noise points.
## 
##  0  1  2  3  4 
## 11  4 52  4 21 
## 
## Available fields: cluster, eps, minPts

Saving results:

Comparing clusters

Prepare a bidimensional map:

View bidimensional map:

Results from pam:

Results from agnes:

Results from diana:

Compare visually:

Viewing pam, agnes, and diana plots side by side

Plot results from DBSCAN:

Annotating graph with country names:

Annotating just the outlier countries:

CHOOSING DIANA METHOD DUE TO HAVING ZERO NEGATIVE SILHOUETTES。

QUESTION 2 REGRESSION CODE START

  • Hypothesis:
    • Model 1: FF ~ Population
    • Model 2: FF ~ Population + Developed
  • Method:
    • Binary Outcome – – FF usage (Median percentage FF use of total electricity capacity)
  • Control variable – – Developed (Median GDP per capita)
  • Independent variable – – Population
  • Rationale for hypothesis:
    • Larger populations would exhibit higher fossil fuel usage as a percent of total electricity capacity

Changing dtype for population:

##                 Country fossilFuel_PctTotalElec           OilProduction 
##             "character"               "numeric"               "numeric" 
##              Population                  GDP_pc 
##               "numeric"               "integer"

Changing dtype for GDP_pc(gdp):

##                 Country fossilFuel_PctTotalElec           OilProduction 
##             "character"               "numeric"               "numeric" 
##              Population                  GDP_pc 
##               "numeric"               "numeric"

Filtering out non-oil producing countries & creating new DF (teamnew):

##                               Country fossilFuel_PctTotalElec
## 2                             Albania                    0.05
## 3                             Algeria                    0.96
## 4                              Angola                    0.34
## 6                           Argentina                    0.69
## 9                           Australia                    0.72
## 10                            Austria                    0.25
## 11                         Azerbaijan                    0.84
## 13                            Bahrain                    1.00
## 14                         Bangladesh                    0.97
## 15                           Barbados                    0.93
## 16                            Belarus                    0.96
## 18                             Belize                    0.51
## 21                            Bolivia                    0.76
## 24                             Brazil                    0.17
## 25                             Brunei                    1.00
## 26                           Bulgaria                    0.39
## 28                              Burma                    0.39
## 32                           Cameroon                    0.52
## 33                             Canada                    0.23
## 35                               Chad                    0.98
## 36                              Chile                    0.59
## 37                              China                    0.62
## 38                           Colombia                    0.29
## 40  Congo, Democratic Republic of the                    0.02
## 41             Congo, Republic of the                    0.64
## 43                            Croatia                    0.45
## 45                            Czechia                    0.60
## 46                            Denmark                    0.46
## 50                            Ecuador                    0.43
## 51                              Egypt                    0.91
## 53                  Equatorial Guinea                    0.61
## 60                             France                    0.17
## 61                              Gabon                    0.51
## 63                            Georgia                    0.35
## 64                            Germany                    0.41
## 65                              Ghana                    0.58
## 66                             Greece                    0.57
## 68                          Guatemala                    0.41
## 75                            Hungary                    0.64
## 77                              India                    0.71
## 78                          Indonesia                    0.85
## 79                               Iran                    0.84
## 80                               Iraq                    0.91
## 82                             Israel                    0.95
## 83                              Italy                    0.54
## 85                              Japan                    0.71
## 87                         Kazakhstan                    0.86
## 91                             Kuwait                    1.00
## 92                         Kyrgyzstan                    0.24
## 98                              Libya                    1.00
## 99                          Lithuania                    0.73
## 104                          Malaysia                    0.78
## 109                        Mauritania                    0.65
## 111                            Mexico                    0.71
## 114                          Mongolia                    0.87
## 116                           Morocco                    0.68
## 121                       Netherlands                    0.75
## 122                       New Zealand                    0.23
## 124                             Niger                    0.95
## 125                           Nigeria                    0.80
## 127                            Norway                    0.03
## 128                              Oman                    1.00
## 129                          Pakistan                    0.62
## 131                  Papua New Guinea                    0.63
## 133                              Peru                    0.61
## 134                       Philippines                    0.67
## 135                            Poland                    0.79
## 139                           Romania                    0.47
## 140                            Russia                    0.68
## 147                      Saudi Arabia                    1.00
## 149                            Serbia                    0.65
## 153                          Slovakia                    0.36
## 156                      South Africa                    0.85
## 158                             Spain                    0.47
## 160                             Sudan                    0.44
## 161                          Suriname                    0.61
## 164                            Taiwan                    0.79
## 165                        Tajikistan                    0.06
## 167                          Thailand                    0.76
## 168                       Timor-Leste                    0.00
## 171               Trinidad and Tobago                    1.00
## 172                           Tunisia                    0.94
## 173                            Turkey                    0.53
## 174                      Turkmenistan                    1.00
## 177                           Ukraine                    0.65
## 178              United Arab Emirates                    0.99
## 179                    United Kingdom                    0.50
## 180                     United States                    0.70
## 182                        Uzbekistan                    0.86
## 184                         Venezuela                    0.51
## 185                           Vietnam                    0.56
## 186                             Yemen                    0.79
##     OilProduction Population GDP_pc
## 2           22915    2880917   5372
## 3         1348361   43053054   3980
## 4         1769615   31825295   3037
## 6          510560   44780677   9887
## 9          289749   25203198  53825
## 10          15161    8955102  50022
## 11         833538   10047718   4689
## 13          40000    1641172  25273
## 14           4189  163046161   1905
## 15           1000     287025  18069
## 16          25000    9452411   6603
## 18           2000     390353   4925
## 21          58077   11513100   3670
## 24        2515459  211049527   8796
## 25         109117     433285  27871
## 26           1000    7000119   9518
## 28          15000   54045420   1244
## 32          93205   25876380   1514
## 33        3662694   37411047  46212
## 35         110156   15946876    861
## 36           4423   18952038  15399
## 37        3980650 1433783686  10098
## 38         897784   50339443   6508
## 40          20000   86790567    500
## 41         308363    5380508   2534
## 43          13582    4130304  14949
## 45           2333   10689209  23213
## 46         140637    5771876  59795
## 50         548421   17373662   6249
## 51         490000  100388073   3046
## 53         227000    1355986   8927
## 60          16418   65129728  41760
## 61         210820    2172579   8112
## 63            400    3996765   4289
## 64          46839   83517045  46563
## 65         100549   28833629   2223
## 66           3172   10473455  19974
## 68           8977   17581472   4616
## 75          13833    9684679  17463
## 77         715459 1366417754   2171
## 78         833667  270625568   4163
## 79        3990956   82913906   5506
## 80        4451516   39309783   5738
## 82            390    8519377  42823
## 83          70675   60550075  32946
## 85           3918  126860301  40846
## 87        1595199   18551427   9139
## 91        2923825    4207083  29266
## 92           1000    6415850   1292
## 98        1003000    6777452   5019
## 99           2000    2759627  19266
## 104        661240   31949777  11136
## 109          5000    4525696   1392
## 111       2186877  127575529  10118
## 114         23426    3225167   4132
## 116           160   36471769   3345
## 121         18087   17097130  52367
## 122         35574    4783063  40634
## 124         13000   23310715    405
## 125       1999885  200963599   2222
## 127       1647975    5378857  77975
## 128       1006841    4974986  17791
## 129         80000  216565318   1388
## 131         56667    8776109   2742
## 133         40266   32510453   7046
## 134         20000  108116615   3294
## 135         20104   37887768  14901
## 139        504000   19364557  12482
## 140      10800000  145872256  11162
## 147      12000000   34268528  22865
## 149         20000    8772235   7397
## 153           200    5457013  19547
## 156          2000   58558270   6100
## 158          2667   46736776  29961
## 160        255000   42813238    714
## 161         17000     581372   6310
## 164           196   23773876  24827
## 165           180    9321018    877
## 167        257525   69037513   7791
## 168         60661    1293119   2262
## 171         60090    1394973  16365
## 172         48757   11694719   3287
## 173         49497   83429615   8957
## 174        230779    5942089   7816
## 177         31989   43993638   3592
## 178       3106077    9770529  37749
## 179        939760   67530172  41030
## 180      15043000  329064917  65111
## 182         52913   32981716   1831
## 184       2276967   28515829   2547
## 185        301850   96462106   2740
## 186         22000   29161922    943

Converting USDollar to a factor variable

Calling new variable ‘Developed’

Converting fossilFuel_PctTotalElec to a factor variable

Calling new variable ‘FF’

Checking dtypes:

## 'data.frame':    92 obs. of  7 variables:
##  $ Country                : chr  "Albania" "Algeria" "Angola" "Arge"..
##  $ fossilFuel_PctTotalElec: num  0.05 0.96 0.34 0.69 0.72 0.25 0.84 ..
##  $ OilProduction          : num  22915 1348361 1769615 510560 289749..
##  $ Population             : num  2880917 43053054 31825295 44780677 ..
##  $ GDP_pc                 : num  5372 3980 3037 9887 53825 ...
##  $ Developed              : Factor w/ 2 levels "0","1": 1 1 1 2 2 2 ..
##  $ FF                     : Factor w/ 2 levels "0","1": 1 2 1 2 2 1 ..

Defining ‘Population’ as independent variable:

Defining columns needed:

Verify dtypes for colsNeededDico:

## 'data.frame':    92 obs. of  3 variables:
##  $ FF        : Factor w/ 2 levels "0","1": 1 2 1 2 2 1 2 2 2 2 ...
##  $ Population: num  2880917 43053054 31825295 44780677 25203198 ...
##  $ Developed : Factor w/ 2 levels "0","1": 1 1 1 2 2 2 1 2 1 2 ...

Create subset

Rename indexes by country

Define & compute regression models

Results of hypo3:

At p-value of 0.634, this model is not statistically significant

## 
## Call:
## glm(formula = hypo3, family = "binomial", data = DataRegLogis)
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -1.436  -1.129  -1.126   1.223   1.230  
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.231e-01  2.217e-01  -0.555    0.579
## Population   4.972e-10  1.045e-09   0.476    0.634
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 127.37  on 91  degrees of freedom
## Residual deviance: 127.13  on 90  degrees of freedom
## AIC: 131.13
## 
## Number of Fisher Scoring iterations: 4

Results of hypo4:

At p-values of 0.631 and 0.673, this model also is not statistically significant:

## 
## Call:
## glm(formula = hypo4, family = "binomial", data = DataRegLogis)
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -1.479  -1.141  -1.089   1.192   1.268  
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)
## (Intercept) -2.120e-01  3.063e-01  -0.692    0.489
## Population   5.028e-10  1.048e-09   0.480    0.631
## Developed1   1.768e-01  4.184e-01   0.422    0.673
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 127.37  on 91  degrees of freedom
## Residual deviance: 126.95  on 89  degrees of freedom
## AIC: 132.95
## 
## Number of Fisher Scoring iterations: 4

Analysis of variance between models:

## Analysis of Deviance Table
## 
## Model 1: FF ~ Population
## Model 2: FF ~ Population + Developed
##   Resid. Df Resid. Dev Df Deviance Pr(>Chi)
## 1        90     127.13                     
## 2        89     126.95  1  0.17869   0.6725

Recommendations

First Question

Oil production could be an important component of GDP, but higher oil production rate does not lead to higher GDP. If we want to evaluate the relationship between GDP and oil production, we also need to know what is the percentage of the GDP generated by oil production. Same level variables are more easy to be compared Too many countries that their oil production is close to zero Try other control variables like export/import Higher oil production does not lead to higher GDP necessarily

Second Question

Neither model is statistically significant; no further analysis required.

Recommendations for future analysis of question #2 include: Incorporate country-specific income levels as an additional variable Remove major outliers from sample population *Use actual fossil fuel usage data in lieu of ratios